Add NPS benchmark for search speed regression testing by luccabb · Pull Request #46 · luccabb/moonfish

luccabb · 2026-02-15T07:19:41Z

Summary

Add a node counter (self.nodes) to AlphaBeta, incremented in negamax() and quiescence_search()
Add moonfish/bench.py with 48 positions from Stockfish's bench suite and a run_bench() function that reports per-position and total nodes, time, and NPS
Add --mode bench to the CLI (moonfish --mode bench)
Add CI workflow (.github/workflows/nps-benchmark.yml) that runs on PRs and posts results as a PR comment

Node count is deterministic (RNG is seeded) and serves as the primary signal — if it changes, the PR changed search behavior. NPS is informational only since CI runner performance varies.
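For readers unfamiliar with the pattern, here is a minimal sketch of counting nodes in a negamax search. It is not the moonfish code: `Node`, `CountingSearch`, and `search()` are hypothetical stand-ins on a toy game tree, and only the placement of the `self.nodes` increments and the per-search reset mirror what this PR describes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy game-tree node: a static score plus child positions."""
    score: int
    children: list["Node"] = field(default_factory=list)


class CountingSearch:
    """Hypothetical stand-in for AlphaBeta; only the counter mirrors the PR."""

    def __init__(self) -> None:
        self.nodes = 0

    def quiescence_search(self, node: Node) -> int:
        self.nodes += 1  # every quiescence visit counts as one node
        return node.score  # toy version: no capture search

    def negamax(self, node: Node, depth: int, alpha: int, beta: int) -> int:
        self.nodes += 1  # every negamax visit counts as one node
        if depth == 0 or not node.children:
            return self.quiescence_search(node)
        best = -10**9
        for child in node.children:
            best = max(best, -self.negamax(child, depth - 1, -beta, -alpha))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # cutoff: the remaining children are never visited or counted
        return best

    def search(self, root: Node, depth: int) -> int:
        self.nodes = 0  # reset at the start of each search
        return self.negamax(root, depth, -10**9, 10**9)


engine = CountingSearch()
root = Node(0, [Node(3), Node(5)])
score = engine.search(root, depth=1)
print(score, engine.nodes)
```

Because the counter only ever increments on a visit, any change to pruning or move ordering shows up as a different total, which is what makes it usable as a behavior signal.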

Test plan

moonfish --mode bench runs all 48 positions and prints NPS results
Running twice produces identical node counts (543,813 at depth 3)
Existing test_alpha_beta tests pass (node counter doesn't break anything)
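The "running twice" check can be sketched as follows. The PR does not say where randomness enters the search, so this toy assumes it affects move ordering, which is one place where alpha-beta node counts are sensitive to it. `count_nodes`, `bench_nodes`, and `TREE` are hypothetical names used only for illustration.

```python
import random


def count_nodes(tree, alpha=-10**9, beta=10**9):
    """Negamax with alpha-beta over nested lists; returns (score, nodes visited)."""
    if isinstance(tree, int):
        return tree, 1  # leaf: its value, one node visited
    children = list(tree)
    random.shuffle(children)  # assumed source of randomness: move ordering
    best, nodes = -10**9, 1
    for child in children:
        score, visited = count_nodes(child, -beta, -alpha)
        nodes += visited
        best = max(best, -score)
        alpha = max(alpha, best)
        if alpha >= beta:
            break  # cutoff depends on the order the children were tried in
    return best, nodes


def bench_nodes(tree, seed=0):
    """Seed the RNG, then search, so the node count is reproducible."""
    random.seed(seed)
    return count_nodes(tree)[1]


TREE = [[3, 5, [1, 9]], [6, [2, 8], 4], [7, 0, [5, 5]]]
first = bench_nodes(TREE)
second = bench_nodes(TREE)
assert first == second, "node count changed between identical runs"
```

Without the `random.seed` call the score would stay the same but the node count could differ between runs, which would make the count useless as a regression signal.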

Add a node counter to AlphaBeta and a bench mode that searches 48 positions from Stockfish's bench suite, reporting per-position and total nodes, time, and NPS. Node count is deterministic and serves as the primary signal for detecting search behavior changes. Includes a CI workflow that runs on PRs and posts results as a comment.

greptile-apps · 2026-02-15T07:23:17Z

Greptile Summary

Adds a deterministic NPS benchmark suite for regression testing search speed. A self.nodes counter is added to AlphaBeta, incremented in negamax() and quiescence_search(), and reset per search_move() call. A new bench.py module searches 48 Stockfish bench positions with a seeded RNG for reproducible node counts. The CLI gains --mode bench and a CI workflow posts benchmark results as PR comments.

The --depth CLI flag is silently ignored when running bench mode — run_bench(depth=5) is hardcoded in main.py:18 instead of using config.negamax_depth
Node counting in alpha_beta.py is minimal and correctly placed; it does not affect search behavior or existing tests
CI workflow has appropriate contents: read and pull-requests: write permissions and only triggers on engine code changes

Confidence Score: 4/5

This PR is safe to merge with one minor fix needed for the hardcoded bench depth.
The core engine change (node counter) is minimal and correct. The bench module is well-structured with deterministic seeding. The only issue is the hardcoded depth in main.py which silently ignores the CLI flag — a straightforward fix but worth addressing before merge.
moonfish/main.py — hardcoded depth ignores CLI --depth parameter

Important Files Changed

Filename	Overview
moonfish/engines/alpha_beta.py	Adds `self.nodes` counter initialized to 0, incremented in both `negamax()` and `quiescence_search()`, and reset in `search_move()`. Clean, minimal change with no impact on search behavior.
moonfish/bench.py	New benchmark module with 48 Stockfish bench positions. Seeds RNG for deterministic node counts. Correctly handles terminal positions and reports per-position and total NPS.
moonfish/main.py	Adds `bench` mode to CLI. However, `run_bench(depth=5)` hardcodes the depth, silently ignoring the `--depth` CLI parameter.
.github/workflows/nps-benchmark.yml	CI workflow runs bench on PRs that touch engine code, parses output, and posts results as a PR comment. Permissions are appropriately scoped. Output parsing relies on consistent print format from bench.py.
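To illustrate the coupling between the workflow's parsing and the print format, here is a hedged sketch of a parser for the per-position lines as they appear in the CI comments on this PR. The workflow's real parsing step is not shown here, so `LINE_RE` and `parse_position_line` are hypothetical; only the line format is taken from the posted output.

```python
import re

# Matches lines such as:
#   Position  1/48: nodes=155456     time=31.17s  nps=4986
#   Position 47/48: nodes=0          time=0.00s  nps=0  (terminal)
LINE_RE = re.compile(
    r"Position\s+(?P<index>\d+)/(?P<total>\d+):\s+"
    r"nodes=(?P<nodes>\d+)\s+time=(?P<time>\d+\.\d+)s\s+nps=(?P<nps>\d+)"
    r"(?P<terminal>\s+\(terminal\))?"
)


def parse_position_line(line: str) -> dict:
    """Parse one per-position line; fail loudly if the format has drifted."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized bench output line: {line!r}")
    return {
        "index": int(match["index"]),
        "total": int(match["total"]),
        "nodes": int(match["nodes"]),
        "time": float(match["time"]),
        "nps": int(match["nps"]),
        "terminal": match["terminal"] is not None,
    }


print(parse_position_line("Position  1/48: nodes=155456     time=31.17s  nps=4986"))
```

Raising on an unrecognized line, rather than skipping it, is a deliberate choice in this sketch: a silent skip would let a format change in bench.py produce a plausible but incomplete report.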

Sequence Diagram

sequenceDiagram
    participant CLI as CLI (main.py)
    participant Bench as bench.py
    participant Engine as AlphaBeta
    participant QSearch as quiescence_search

    CLI->>Bench: run_bench(depth)
    Bench->>Bench: random.seed(0)
    loop 48 positions
        Bench->>Bench: _make_board(position)
        Bench->>Engine: search_move(board)
        Engine->>Engine: self.nodes = 0
        Engine->>Engine: negamax(board, depth)
        Engine->>Engine: self.nodes += 1
        Engine->>QSearch: quiescence_search(board)
        QSearch->>QSearch: self.nodes += 1
        Engine-->>Bench: best_move
        Bench->>Bench: read engine.nodes, accumulate totals
    end
    Bench->>CLI: print NPS results
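The flow in the diagram can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of bench.py: `FakeEngine`, `run_bench_sketch`, and the dictionary-based positions are hypothetical, and the way terminal positions are detected here (a flag on the position) is a placeholder for whatever the real module does.

```python
import random
import time


class FakeEngine:
    """Hypothetical engine standing in for AlphaBeta so the loop is runnable."""

    def __init__(self) -> None:
        self.nodes = 0

    def search_move(self, position: dict) -> str:
        self.nodes = 0  # reset per search, as in the diagram
        for _ in range(position["work"]):
            self.nodes += 1  # pretend each unit of work is one visited node
        return "best-move"


def run_bench_sketch(positions: list, engine: FakeEngine) -> tuple:
    random.seed(0)  # seed once, before the loop
    total_nodes, total_time = 0, 0.0
    count = len(positions)
    for i, position in enumerate(positions, start=1):
        if position["terminal"]:
            # Nothing to search: report zeros and move on.
            print(f"Position {i:2d}/{count}: nodes=0 time=0.00s  nps=0  (terminal)")
            continue
        start = time.perf_counter()
        engine.search_move(position)
        elapsed = time.perf_counter() - start
        nps = int(engine.nodes / elapsed) if elapsed > 0 else 0
        print(f"Position {i:2d}/{count}: nodes={engine.nodes:<10} time={elapsed:.2f}s  nps={nps}")
        total_nodes += engine.nodes
        total_time += elapsed
    return total_nodes, total_time


positions = [
    {"work": 1000, "terminal": False},
    {"work": 250, "terminal": False},
    {"work": 0, "terminal": True},
]
nodes, seconds = run_bench_sketch(positions, FakeEngine())
print(f"Total nodes: {nodes}")
```

Reading `engine.nodes` after each call, rather than having the engine return it, keeps the search API unchanged, which matches the diagram's "read engine.nodes, accumulate totals" step.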

Last reviewed commit: c5bbecd

greptile-apps

4 files reviewed, 1 comment

greptile-apps · 2026-02-15T07:23:20Z

moonfish/main.py

+    elif config.mode == "bench":
+        run_bench(depth=5)


--depth CLI flag silently ignored in bench mode

run_bench(depth=5) hardcodes depth to 5, ignoring the --depth value passed via the CLI and stored in config.negamax_depth. A user running moonfish --mode bench --depth 3 would still get depth 5.

The CI workflow also passes --depth 5, which currently has no effect because the value is hardcoded here.

Suggested change

-    elif config.mode == "bench":
-        run_bench(depth=5)
+    elif config.mode == "bench":
+        run_bench(depth=config.negamax_depth)
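The shape of the fix, passing the parsed depth through instead of a literal, can be shown with a self-contained sketch. The CLI library moonfish actually uses is not visible in this PR, so the example uses argparse; `main`, the stub `run_bench`, and the default of 5 are assumptions, with `args.depth` standing in for `config.negamax_depth`.

```python
import argparse


def run_bench(depth: int) -> str:
    """Stub standing in for moonfish's run_bench(); just reports the depth."""
    return f"bench at depth {depth}"


def main(argv=None) -> str:
    parser = argparse.ArgumentParser(prog="moonfish-sketch")
    parser.add_argument("--mode", default="bench")
    parser.add_argument("--depth", type=int, default=5)  # default is an assumption
    args = parser.parse_args(argv)
    if args.mode == "bench":
        # Forward the parsed value; a literal here would ignore --depth.
        return run_bench(depth=args.depth)
    raise ValueError(f"unsupported mode in this sketch: {args.mode}")


print(main(["--mode", "bench", "--depth", "3"]))
print(main(["--mode", "bench"]))
```

With the depth forwarded, the `--depth 5` already passed by the CI workflow becomes the thing that actually controls the run.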

github-actions · 2026-02-15T08:41:21Z

🔬 Stockfish Benchmark Results

vs Stockfish Skill Level 3

Metric	Wins	Losses	Draws	Total	Win %
Overall	20	74	6	100	20.0%
As White	10	36	4	50	20.0%
As Black	10	38	2	50	20.0%

Non-checkmate endings:

Draw by 3-fold repetition: 5

vs Stockfish Skill Level 4

Metric	Wins	Losses	Draws	Total	Win %
Overall	20	73	7	100	20.0%
As White	13	33	4	50	26.0%
As Black	7	40	3	50	14.0%

Non-checkmate endings:

Draw by 3-fold repetition: 6
Draw by fifty moves rule: 1

vs Stockfish Skill Level 5

Metric	Wins	Losses	Draws	Total	Win %
Overall	8	85	7	100	8.0%
As White	4	41	5	50	8.0%
As Black	4	44	2	50	8.0%

Non-checkmate endings:

Draw by 3-fold repetition: 7

Configuration

5 chunks × 20 rounds × 3 skill levels = 300 total games
Each opening played with colors reversed (-repeat) for fairness
Moonfish: 60s per move
Stockfish: 60+5 time control
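As a quick consistency check on the tables above, the game totals and win percentages can be recomputed from the posted counts. The helper name `win_pct` is made up for this sketch, and it assumes Win % is wins divided by total games, which matches the figures in this comment.

```python
chunks, rounds, skill_levels = 5, 20, 3
games_per_level = chunks * rounds            # 100 games per skill level
total_games = games_per_level * skill_levels  # 300 games overall


def win_pct(wins: int, losses: int, draws: int) -> float:
    """Wins as a percentage of all games played."""
    return 100.0 * wins / (wins + losses + draws)


# (wins, losses, draws) from the "Overall" rows of this comment
overall = {3: (20, 74, 6), 4: (20, 73, 7), 5: (8, 85, 7)}
for level, (w, l, d) in overall.items():
    assert w + l + d == games_per_level
    print(f"Skill Level {level}: {win_pct(w, l, d):.1f}%")
print(f"Total games: {total_games}")
```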

github-actions · 2026-02-15T08:41:47Z

⚡ NPS Benchmark Results

Metric	Value
Depth	5
Positions	48
Total nodes	21939310
Total time	4904.51s
Nodes/second	4473

Node count is the primary signal — it's deterministic and catches search behavior changes. If the node count changes, the PR changed search behavior. NPS is informational only (CI runner performance varies).

Per-position breakdown

Position  1/48: nodes=155456     time=31.17s  nps=4986
Position  2/48: nodes=762397     time=176.60s  nps=4316
Position  3/48: nodes=9587       time=1.32s  nps=7289
Position  4/48: nodes=857677     time=192.73s  nps=4450
Position  5/48: nodes=78423      time=18.52s  nps=4235
Position  6/48: nodes=519814     time=121.44s  nps=4280
Position  7/48: nodes=354259     time=83.56s  nps=4239
Position  8/48: nodes=387596     time=75.70s  nps=5120
Position  9/48: nodes=1652736    time=336.43s  nps=4912
Position 10/48: nodes=472629     time=92.16s  nps=5128
Position 11/48: nodes=516877     time=128.93s  nps=4008
Position 12/48: nodes=960140     time=238.87s  nps=4019
Position 13/48: nodes=618484     time=140.97s  nps=4387
Position 14/48: nodes=700607     time=147.29s  nps=4756
Position 15/48: nodes=654854     time=134.16s  nps=4880
Position 16/48: nodes=261335     time=49.35s  nps=5295
Position 17/48: nodes=17256      time=2.63s  nps=6572
Position 18/48: nodes=12611      time=1.53s  nps=8237
Position 19/48: nodes=38487      time=5.87s  nps=6552
Position 20/48: nodes=86927      time=12.06s  nps=7208
Position 21/48: nodes=16944      time=2.41s  nps=7021
Position 22/48: nodes=475        time=0.06s  nps=8246
Position 23/48: nodes=10664      time=1.41s  nps=7585
Position 24/48: nodes=33008      time=5.35s  nps=6165
Position 25/48: nodes=10136      time=1.46s  nps=6936
Position 26/48: nodes=79572      time=13.09s  nps=6076
Position 27/48: nodes=82542      time=11.61s  nps=7107
Position 28/48: nodes=308023     time=56.57s  nps=5444
Position 29/48: nodes=231702     time=50.67s  nps=4572
Position 30/48: nodes=2547       time=0.38s  nps=6749
Position 31/48: nodes=1474637    time=300.61s  nps=4905
Position 32/48: nodes=727292     time=145.69s  nps=4992
Position 33/48: nodes=2470627    time=790.65s  nps=3124
Position 34/48: nodes=1291369    time=339.29s  nps=3806
Position 35/48: nodes=557752     time=105.16s  nps=5303
Position 36/48: nodes=1931624    time=405.97s  nps=4758
Position 37/48: nodes=1551790    time=305.94s  nps=5072
Position 38/48: nodes=14491      time=1.54s  nps=9413
Position 39/48: nodes=5184       time=0.50s  nps=10269
Position 40/48: nodes=22316      time=0.95s  nps=23529
Position 41/48: nodes=131447     time=18.36s  nps=7158
Position 42/48: nodes=83030      time=12.15s  nps=6831
Position 43/48: nodes=23479      time=3.00s  nps=7815
Position 44/48: nodes=102571     time=16.88s  nps=6077
Position 45/48: nodes=45937      time=8.24s  nps=5577
Position 46/48: nodes=1611999    time=315.27s  nps=5113
Position 47/48: nodes=0          time=0.00s  nps=0  (terminal)
Position 48/48: nodes=0          time=0.00s  nps=0  (terminal)
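The headline figure is total nodes divided by total time. A small sketch of that arithmetic using the totals from this comment follows; whether bench.py truncates or rounds is not shown in the PR, so `nodes_per_second` (a hypothetical helper) truncates, which gives the same integer as rounding for these totals.

```python
def nodes_per_second(nodes: int, seconds: float) -> int:
    """Integer NPS; returns 0 when no time elapsed (e.g. terminal positions)."""
    return int(nodes / seconds) if seconds > 0 else 0


total_nodes = 21_939_310
total_time_s = 4904.51
print(nodes_per_second(total_nodes, total_time_s))  # 4473, matching the table
print(nodes_per_second(0, 0.0))                     # 0, as for terminal positions
```

Per-position values recomputed this way can differ slightly from the posted ones, because the times in the breakdown are rounded to two decimals for display.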

github-actions · 2026-02-15T08:43:10Z

🔬 Stockfish Benchmark Results

vs Stockfish Skill Level 3

Metric	Wins	Losses	Draws	Total	Win %
Overall	30	63	7	100	30.0%
As White	18	28	4	50	36.0%
As Black	12	35	3	50	24.0%

Non-checkmate endings:

Draw by 3-fold repetition: 7

vs Stockfish Skill Level 4

Metric	Wins	Losses	Draws	Total	Win %
Overall	22	68	10	100	22.0%
As White	15	31	4	50	30.0%
As Black	7	37	6	50	14.0%

Non-checkmate endings:

Draw by 3-fold repetition: 9

vs Stockfish Skill Level 5

Metric	Wins	Losses	Draws	Total	Win %
Overall	8	87	5	100	8.0%
As White	6	42	2	50	12.0%
As Black	2	45	3	50	4.0%

Non-checkmate endings:

Draw by 3-fold repetition: 5

Configuration

5 chunks × 20 rounds × 3 skill levels = 300 total games
Each opening played with colors reversed (-repeat) for fairness
Moonfish: 60s per move
Stockfish: 60+5 time control

github-actions · 2026-02-15T08:43:44Z

⚡ NPS Benchmark Results

Metric	Value
Depth	5
Positions	48
Total nodes	21939310
Total time	4887.20s
Nodes/second	4489

Node count is the primary signal — it's deterministic and catches search behavior changes. If the node count changes, the PR changed search behavior. NPS is informational only (CI runner performance varies).

Per-position breakdown

Position  1/48: nodes=155456     time=31.03s  nps=5009
Position  2/48: nodes=762397     time=175.75s  nps=4337
Position  3/48: nodes=9587       time=1.29s  nps=7427
Position  4/48: nodes=857677     time=190.96s  nps=4491
Position  5/48: nodes=78423      time=18.40s  nps=4261
Position  6/48: nodes=519814     time=121.26s  nps=4286
Position  7/48: nodes=354259     time=83.37s  nps=4249
Position  8/48: nodes=387596     time=75.58s  nps=5128
Position  9/48: nodes=1652736    time=336.27s  nps=4914
Position 10/48: nodes=472629     time=91.50s  nps=5165
Position 11/48: nodes=516877     time=127.83s  nps=4043
Position 12/48: nodes=960140     time=240.63s  nps=3990
Position 13/48: nodes=618484     time=140.80s  nps=4392
Position 14/48: nodes=700607     time=147.77s  nps=4741
Position 15/48: nodes=654854     time=133.95s  nps=4888
Position 16/48: nodes=261335     time=49.35s  nps=5295
Position 17/48: nodes=17256      time=2.61s  nps=6622
Position 18/48: nodes=12611      time=1.52s  nps=8269
Position 19/48: nodes=38487      time=5.85s  nps=6581
Position 20/48: nodes=86927      time=11.91s  nps=7296
Position 21/48: nodes=16944      time=2.39s  nps=7076
Position 22/48: nodes=475        time=0.06s  nps=8345
Position 23/48: nodes=10664      time=1.39s  nps=7652
Position 24/48: nodes=33008      time=5.28s  nps=6248
Position 25/48: nodes=10136      time=1.44s  nps=7035
Position 26/48: nodes=79572      time=12.87s  nps=6182
Position 27/48: nodes=82542      time=11.50s  nps=7178
Position 28/48: nodes=308023     time=55.80s  nps=5520
Position 29/48: nodes=231702     time=50.02s  nps=4631
Position 30/48: nodes=2547       time=0.37s  nps=6836
Position 31/48: nodes=1474637    time=299.02s  nps=4931
Position 32/48: nodes=727292     time=144.86s  nps=5020
Position 33/48: nodes=2470627    time=788.27s  nps=3134
Position 34/48: nodes=1291369    time=336.75s  nps=3834
Position 35/48: nodes=557752     time=104.88s  nps=5318
Position 36/48: nodes=1931624    time=404.74s  nps=4772
Position 37/48: nodes=1551790    time=304.48s  nps=5096
Position 38/48: nodes=14491      time=1.50s  nps=9650
Position 39/48: nodes=5184       time=0.49s  nps=10475
Position 40/48: nodes=22316      time=0.93s  nps=23892
Position 41/48: nodes=131447     time=18.12s  nps=7253
Position 42/48: nodes=83030      time=12.08s  nps=6874
Position 43/48: nodes=23479      time=2.98s  nps=7878
Position 44/48: nodes=102571     time=16.73s  nps=6129
Position 45/48: nodes=45937      time=8.18s  nps=5612
Position 46/48: nodes=1611999    time=314.38s  nps=5127
Position 47/48: nodes=0          time=0.00s  nps=0  (terminal)
Position 48/48: nodes=0          time=0.00s  nps=0  (terminal)

github-actions · 2026-02-15T09:12:44Z

⚡ NPS Benchmark Results

Metric	Value
Depth	5
Positions	48
Total nodes	21939310
Total time	4525.11s
Nodes/second	4848

Node count is the primary signal — it's deterministic and catches search behavior changes. If the node count changes, the PR changed search behavior. NPS is informational only (CI runner performance varies).

Per-position breakdown

Position  1/48: nodes=155456     time=28.91s  nps=5377
Position  2/48: nodes=762397     time=163.94s  nps=4650
Position  3/48: nodes=9587       time=1.22s  nps=7859
Position  4/48: nodes=857677     time=180.13s  nps=4761
Position  5/48: nodes=78423      time=17.31s  nps=4530
Position  6/48: nodes=519814     time=113.04s  nps=4598
Position  7/48: nodes=354259     time=76.97s  nps=4602
Position  8/48: nodes=387596     time=70.23s  nps=5519
Position  9/48: nodes=1652736    time=314.34s  nps=5257
Position 10/48: nodes=472629     time=85.60s  nps=5521
Position 11/48: nodes=516877     time=118.91s  nps=4346
Position 12/48: nodes=960140     time=220.47s  nps=4355
Position 13/48: nodes=618484     time=128.62s  nps=4808
Position 14/48: nodes=700607     time=135.70s  nps=5162
Position 15/48: nodes=654854     time=122.44s  nps=5348
Position 16/48: nodes=261335     time=45.55s  nps=5737
Position 17/48: nodes=17256      time=2.40s  nps=7202
Position 18/48: nodes=12611      time=1.40s  nps=9021
Position 19/48: nodes=38487      time=5.35s  nps=7195
Position 20/48: nodes=86927      time=11.00s  nps=7904
Position 21/48: nodes=16944      time=2.22s  nps=7634
Position 22/48: nodes=475        time=0.05s  nps=9062
Position 23/48: nodes=10664      time=1.29s  nps=8249
Position 24/48: nodes=33008      time=4.88s  nps=6766
Position 25/48: nodes=10136      time=1.33s  nps=7626
Position 26/48: nodes=79572      time=11.86s  nps=6708
Position 27/48: nodes=82542      time=10.56s  nps=7820
Position 28/48: nodes=308023     time=51.13s  nps=6024
Position 29/48: nodes=231702     time=45.80s  nps=5059
Position 30/48: nodes=2547       time=0.34s  nps=7418
Position 31/48: nodes=1474637    time=274.78s  nps=5366
Position 32/48: nodes=727292     time=133.47s  nps=5449
Position 33/48: nodes=2470627    time=721.64s  nps=3423
Position 34/48: nodes=1291369    time=312.35s  nps=4134
Position 35/48: nodes=557752     time=97.66s  nps=5711
Position 36/48: nodes=1931624    time=378.25s  nps=5106
Position 37/48: nodes=1551790    time=284.37s  nps=5456
Position 38/48: nodes=14491      time=1.39s  nps=10418
Position 39/48: nodes=5184       time=0.46s  nps=11381
Position 40/48: nodes=22316      time=0.87s  nps=25674
Position 41/48: nodes=131447     time=16.84s  nps=7807
Position 42/48: nodes=83030      time=11.15s  nps=7447
Position 43/48: nodes=23479      time=2.75s  nps=8541
Position 44/48: nodes=102571     time=15.43s  nps=6646
Position 45/48: nodes=45937      time=7.56s  nps=6074
Position 46/48: nodes=1611999    time=293.18s  nps=5498
Position 47/48: nodes=0          time=0.00s  nps=0  (terminal)
Position 48/48: nodes=0          time=0.00s  nps=0  (terminal)
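The three NPS comments on this PR show the intended reading: the node total is identical in every run while the timing moves. A hedged sketch of that comparison, using the totals from the first and third runs; `BenchRun` and `compare` are hypothetical names, not part of the workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchRun:
    total_nodes: int
    total_time_s: float

    @property
    def nps(self) -> float:
        return self.total_nodes / self.total_time_s


def compare(baseline: BenchRun, candidate: BenchRun) -> dict:
    """Node mismatch means search behavior changed; NPS delta is informational."""
    return {
        "search_changed": baseline.total_nodes != candidate.total_nodes,
        "nps_change_pct": 100.0 * (candidate.nps - baseline.nps) / baseline.nps,
    }


first_run = BenchRun(21_939_310, 4904.51)
third_run = BenchRun(21_939_310, 4525.11)
result = compare(first_run, third_run)
print(result["search_changed"])
print(f"{result['nps_change_pct']:.1f}%")
```

Here the node totals match, so the roughly 8% NPS difference between the two runs reflects runner variance rather than a change in the search.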

github-actions · 2026-02-15T09:25:27Z

🔬 Stockfish Benchmark Results

vs Stockfish Skill Level 3

Metric	Wins	Losses	Draws	Total	Win %
Overall	37	54	9	100	37.0%
As White	20	23	7	50	40.0%
As Black	17	31	2	50	34.0%

Non-checkmate endings:

Draw by 3-fold repetition: 8
Draw by insufficient mating material: 1

vs Stockfish Skill Level 4

Metric	Wins	Losses	Draws	Total	Win %
Overall	18	77	5	100	18.0%
As White	11	35	4	50	22.0%
As Black	7	42	1	50	14.0%

Non-checkmate endings:

Draw by 3-fold repetition: 4

vs Stockfish Skill Level 5

Metric	Wins	Losses	Draws	Total	Win %
Overall	4	89	7	100	4.0%
As White	1	45	4	50	2.0%
As Black	3	44	3	50	6.0%

Non-checkmate endings:

Draw by 3-fold repetition: 7

Configuration

5 chunks × 20 rounds × 3 skill levels = 300 total games
Each opening played with colors reversed (-repeat) for fairness
Moonfish: 60s per move
Stockfish: 60+5 time control

luccabb added 2 commits February 14, 2026 23:19

Add source link for Stockfish bench positions

c5bbecd

greptile-apps bot reviewed Feb 15, 2026

Fix formatting and import sorting

40ad1d7

Merge master into add-nps-benchmark

83a8441

luccabb merged commit 6172f0e into master Feb 16, 2026
10 checks passed

luccabb deleted the add-nps-benchmark branch February 16, 2026 05:06

luccabb merged 4 commits into master from add-nps-benchmark
